primary model
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.97)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
CURE: Confidence-driven Unified Reasoning Ensemble Framework for Medical Question Answering
Elshaer, Ziad, Rashed, Essam A.
High-performing medical Large Language Models (LLMs) typically require extensive fine-tuning with substantial computational resources, limiting accessibility for resource-constrained healthcare institutions. This study introduces a confidence-driven multi-model framework that leverages model diversity to enhance medical question answering without fine-tuning. Our framework employs a two-stage architecture: a confidence detection module assesses the primary model's certainty, and an adaptive routing mechanism directs low-confidence queries to Helper models with complementary knowledge for collaborative reasoning. We evaluate our approach using Qwen3-30B-A3B-Instruct, Phi-4 14B, and Gemma 2 12B across three medical benchmarks: MedQA, MedMCQA, and PubMedQA. Results demonstrate that our framework achieves competitive performance, with particularly strong results on PubMedQA (95.0\%) and MedMCQA (78.0\%). Ablation studies confirm that confidence-aware routing combined with multi-model collaboration substantially outperforms single-model approaches and uniform reasoning strategies. This work establishes that strategic model collaboration offers a practical, computationally efficient pathway to improve medical AI systems, with significant implications for democratizing access to advanced medical AI in resource-limited settings.
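The two-stage architecture above can be sketched in a few lines. This is an illustrative assumption, not the paper's code: the confidence measure (one minus normalized entropy of the answer distribution) and the threshold are hypothetical stand-ins for whatever the confidence detection module actually computes.

```python
import math

def confidence(probs):
    """Confidence as 1 minus the normalized entropy of the answer distribution."""
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    max_entropy = math.log(len(probs))
    return 1.0 - entropy / max_entropy if max_entropy > 0 else 1.0

def route(probs, threshold=0.5):
    """Answer with the primary model when confident; otherwise escalate
    the query to helper models for collaborative reasoning."""
    return "primary" if confidence(probs) >= threshold else "helpers"
```

A sharply peaked answer distribution stays with the primary model, while a near-uniform one is routed to the helpers.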
- Asia > Japan (0.04)
- North America > United States (0.04)
- Africa > Middle East > Egypt > Giza Governorate > Giza (0.04)
- Health & Medicine > Diagnostic Medicine (0.68)
- Health & Medicine > Health Care Technology (0.46)
- Information Technology > Artificial Intelligence > Natural Language > Question Answering (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
CacheClip: Accelerating RAG with Effective KV Cache Reuse
Yang, Bin, Leng, Qiuyu, Zeng, Jun, Wu, Zhenhua
Retrieval-Augmented Generation (RAG) systems suffer from severe time-to-first-token (TTFT) bottlenecks due to long input sequences. Existing KV cache reuse methods face a fundamental trade-off: prefix caching requires identical prefixes that rarely occur in RAG scenarios, while direct precomputation sacrifices quality due to missing inter-chunk attention and repeated attention sinks. Recent methods like APE and CacheBlend partially address these issues but remain inadequate for robust RAG applications. This paper presents CacheClip, a novel framework that achieves both fast TTFT and high generation quality. Our key insight is that small auxiliary LLMs exhibit similar last-layer attention distributions to primary LLMs (the target model for generation), enabling efficient identification of tokens critical for restoring inter-chunk attention, thereby significantly improving response quality on cross-chunk reasoning tasks. CacheClip integrates three techniques: (1) auxiliary-model-guided token selection for selective KV cache recomputation, where the auxiliary model is finetuned to improve selection accuracy, (2) shared prefixes to eliminate redundant attention sinks, and (3) a grouping strategy to maintain local coherence during partial KV cache updates. Experiments show CacheClip retains up to 94.8% and 85.0% of full-attention performance on NIAH and LongBench, outperforming APE and CacheBlend by 25.2% and 35.1% on NIAH (with recomp% = 20%). Meanwhile, CacheClip accelerates LLM inference by up to 1.92x in prefill time, providing a practical solution to the efficiency-quality trade-off in RAG systems.
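The core of technique (1) is a budgeted top-k selection over attention scores. The sketch below is a simplified assumption of that step, not CacheClip's implementation: tokens receiving the highest last-layer attention from the auxiliary model are flagged for KV recomputation under a recomp% budget.

```python
def select_tokens_for_recompute(attn_scores, recomp_frac=0.2):
    """Return (sorted) indices of the top recomp_frac of tokens, ranked by
    the auxiliary model's last-layer attention score; these tokens get
    their KV entries recomputed to restore inter-chunk attention."""
    k = max(1, int(len(attn_scores) * recomp_frac))
    ranked = sorted(range(len(attn_scores)),
                    key=lambda i: attn_scores[i], reverse=True)
    return sorted(ranked[:k])
```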
Generalisation of automatic tumour segmentation in histopathological whole-slide images across multiple cancer types
Skrede, Ole-Johan, Pradhan, Manohar, Isaksen, Maria Xepapadakis, Hveem, Tarjei Sveinsgjerd, Vlatkovic, Ljiljana, Nesbakken, Arild, Lindemann, Kristina, Kristensen, Gunnar B, Kasius, Jenneke, Zeimet, Alain G, Brustugun, Odd Terje, Busund, Lill-Tove Rasmussen, Richardsen, Elin H, Haug, Erik Skaaheim, Brennhovd, Bjørn, Rewcastle, Emma, Lillesand, Melinda, Kvikstad, Vebjørn, Janssen, Emiel, Kerr, David J, Liestøl, Knut, Albregtsen, Fritz, Kleppe, Andreas
Deep learning is expected to aid pathologists by automating tasks such as tumour segmentation. We aimed to develop one universal tumour segmentation model for histopathological images and examine its performance in different cancer types. The model was developed using over 20 000 whole-slide images from over 4 000 patients with colorectal, endometrial, lung, or prostate carcinoma. Performance was validated in pre-planned analyses on external cohorts with over 3 000 patients across six cancer types. Exploratory analyses included over 1 500 additional patients from The Cancer Genome Atlas. Average Dice coefficient was over 80% in all validation cohorts with en bloc resection specimens and in The Cancer Genome Atlas cohorts. No loss of performance was observed when comparing the universal model with models specialised on single cancer types. In conclusion, extensive and rigorous evaluations demonstrate that generic tumour segmentation by a single model is possible across cancer types, patient populations, sample preparations, and slide scanners.
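The Dice coefficient reported above is a standard overlap metric for segmentation masks; a minimal reference implementation over flat binary masks:

```python
def dice(pred, target):
    """Dice = 2|A∩B| / (|A| + |B|) for binary masks; defined as 1.0
    when both masks are empty."""
    inter = sum(p and t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    return 2.0 * inter / total if total else 1.0
```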
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
- Europe > Norway > Eastern Norway > Oslo (0.06)
- Europe > Norway > Western Norway > Rogaland > Stavanger (0.05)
- (11 more...)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Health & Medicine > Therapeutic Area > Oncology > Prostate Cancer (0.48)
- Health & Medicine > Therapeutic Area > Oncology > Lung Cancer (0.46)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Speech Emotion Recognition via Entropy-Aware Score Selection
Chua, ChenYi, Wong, JunKai, Chen, Chengxin, Miao, Xiaoxiao
In this paper, we propose a multimodal framework for speech emotion recognition that leverages entropy-aware score selection to combine speech and textual predictions. The proposed method integrates a primary pipeline consisting of an acoustic model based on wav2vec2.0. We propose a late score fusion approach based on entropy and varentropy thresholds to overcome the confidence constraints of primary-pipeline predictions. Speech Emotion Recognition (SER), which aims to recognise emotions directly from voice inputs as discrete emotion classes [1], has become a crucial area of study in human-computer interaction, enhancing the emotional intelligence of virtual assistants, interactive robots, and mental health monitoring systems [2]. The rapid development of deep SER models, such as Convolutional Neural Networks (CNNs) [3], Recurrent Neural Networks (RNNs) [4], and Transformer-based architectures [5], [6], [7], has substantially improved recognition accuracy by capturing complex temporal and contextual patterns in speech.
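The entropy-thresholded late fusion can be sketched as follows. This is a hedged illustration under assumed details: the fallback rule (use the textual scores when the speech prediction's entropy is high) and the threshold value are assumptions, and the varentropy criterion is omitted for brevity.

```python
import math

def entropy(probs):
    """Shannon entropy of a class-probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def select_scores(speech_probs, text_probs, threshold=1.0):
    """Entropy-aware score selection: trust the primary (speech) pipeline
    when its prediction entropy is low; otherwise fall back to the
    text-based scores."""
    return speech_probs if entropy(speech_probs) < threshold else text_probs
```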
- Health & Medicine > Therapeutic Area > Psychiatry/Psychology (0.54)
- Health & Medicine > Consumer Health (0.54)
Conformal Arbitrage: Risk-Controlled Balancing of Competing Objectives in Language Models
Overman, William, Bayati, Mohsen
Modern language model deployments must often balance competing objectives, for example, helpfulness versus harmlessness, cost versus accuracy, and reward versus safety. We introduce Conformal Arbitrage, a post hoc framework that learns a data driven threshold to mediate between a Primary model optimized for a primary objective and a more conservative Guardian which could be another model or a human domain expert aligned with a guardrail objective. The threshold is calibrated with conformal risk control, yielding finite sample, distribution free guarantees that the long run frequency of undesirable events, such as factual errors or safety violations, does not exceed a user specified quota. Because Conformal Arbitrage operates wholly at the API level, without requiring access to model logits or updating model weights, it complements weight based alignment techniques and integrates seamlessly with existing cost aware cascades. Empirically, Conformal Arbitrage traces an efficient frontier, allowing users to define an acceptable performance level for one objective while maximizing utility in another. We observe that our method outperforms, in terms of accuracy, cost matched random routing between models. These properties make Conformal Arbitrage a practical, theoretically grounded tool for trustworthy and economical deployment of large language models across a broad range of potentially competing objectives.
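The calibration step can be sketched as a threshold search on held-out data. This is a simplified assumption in the spirit of conformal risk control, not the paper's exact procedure: the risk function (error rate among non-deferred cases with a conservative finite-sample correction) and the threshold grid are illustrative.

```python
def calibrate_threshold(confidences, errors, alpha=0.1):
    """Pick the lowest confidence threshold t such that, among calibration
    cases the Primary model keeps (confidence >= t), the corrected
    empirical error rate does not exceed the user quota alpha; cases
    below t are deferred to the Guardian."""
    for t in sorted(set(confidences)):
        kept = [e for c, e in zip(confidences, errors) if c >= t]
        if not kept:
            return t
        risk = (sum(kept) + 1) / (len(kept) + 1)  # conservative correction
        if risk <= alpha:
            return t
    return 1.0
```

Lower thresholds keep more traffic on the cheap Primary model; the search stops at the most permissive threshold whose corrected risk still meets the quota.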
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- Asia > Middle East > Jordan (0.04)
- North America > United States > New York (0.04)
Automated Real-time Assessment of Intracranial Hemorrhage Detection AI Using an Ensembled Monitoring Model (EMM)
Fang, Zhongnan, Johnston, Andrew, Cheuy, Lina, Na, Hye Sun, Paschali, Magdalini, Gonzalez, Camila, Armstrong, Bonnie A., Koirala, Arogya, Laurel, Derrick, Campion, Andrew Walker, Iv, Michael, Chaudhari, Akshay S., Larson, David B.
Artificial intelligence (AI) tools for radiology are commonly unmonitored once deployed. The lack of real-time case-by-case assessments of AI prediction confidence requires users to independently distinguish between trustworthy and unreliable AI predictions, which increases cognitive burden, reduces productivity, and potentially leads to misdiagnoses. To address these challenges, we introduce the Ensembled Monitoring Model (EMM), a framework inspired by clinical consensus practices using multiple expert reviews. Designed specifically for black-box commercial AI products, EMM operates independently without requiring access to internal AI components or intermediate outputs, while still providing robust confidence measurements. Using intracranial hemorrhage detection as our test case on a large, diverse dataset of 2919 studies, we demonstrate that EMM successfully categorizes confidence in the AI-generated prediction, suggesting different actions and helping improve the overall performance of AI tools to ultimately reduce cognitive burden. Importantly, we provide key technical considerations and best practices for successfully translating EMM into clinical settings.
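The consensus idea can be sketched as agreement-based tiering. All specifics here are hypothetical assumptions, not EMM's actual categories or cutoffs: several monitor models vote on the commercial AI's output, and the agreement fraction maps to an action tier.

```python
def categorize(ai_prediction, monitor_predictions):
    """Map monitor-ensemble agreement with a black-box AI's prediction to
    a confidence tier suggesting a follow-up action."""
    agree = sum(p == ai_prediction for p in monitor_predictions)
    frac = agree / len(monitor_predictions)
    if frac >= 0.8:
        return "high-confidence"   # accept the AI result as-is
    if frac >= 0.5:
        return "uncertain"         # flag for prompt review
    return "low-confidence"        # prioritize expert over-read
```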
- North America > United States > California > Santa Clara County > Stanford (0.04)
- North America > United States > Hawaii > Honolulu County > Honolulu (0.04)
- Oceania > Australia > New South Wales > Sydney (0.04)
- (3 more...)
- Research Report > Experimental Study (0.68)
- Research Report > New Finding (0.46)
- Health & Medicine > Therapeutic Area (1.00)
- Health & Medicine > Nuclear Medicine (1.00)
- Health & Medicine > Diagnostic Medicine > Imaging (1.00)
- Government > Regional Government > North America Government > United States Government > FDA (0.35)
Active Sampling for Node Attribute Completion on Graphs
Liu, Benyuan, Chen, Xu, Wang, Yanfeng, Zhang, Ya, Cao, Zhi, Tsang, Ivor
Node attributes, a type of crucial information for graph analysis, may be partially or completely missing for certain nodes in real-world applications. Restoring the missing attributes is expected to benefit downstream graph learning. Few attempts have been made on node attribute completion; a recently proposed framework called Structure-attribute Transformer (SAT) uses a decoupled scheme to leverage structures and attributes. However, SAT ignores the differences among nodes in how much they contribute to the learning schedule, and finding a practical way to model the varying importance of nodes with observed attributes remains challenging. This paper proposes a novel AcTive Sampling algorithm (ATS) to restore missing node attributes. The representativeness and uncertainty of each node's information are first measured based on graph structure, representation similarity, and learning bias. To select nodes as training samples in the next optimization step, a weighting scheme controlled by a Beta distribution is then introduced to linearly combine the two properties. Extensive experiments on four public benchmark datasets and two downstream tasks have shown the superiority of ATS in node attribute completion.
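The Beta-controlled weighting step can be sketched as below. This is an illustrative assumption of the selection mechanics, not ATS's code: a weight drawn from a Beta distribution linearly combines the two per-node scores, and the top-scoring nodes form the next training batch.

```python
import random

def ats_select(representativeness, uncertainty, k, alpha=2.0, beta=2.0, seed=0):
    """Select k node indices by a Beta-weighted linear combination of
    representativeness and uncertainty scores."""
    random.seed(seed)
    w = random.betavariate(alpha, beta)  # mixing weight in (0, 1)
    scores = [w * r + (1 - w) * u
              for r, u in zip(representativeness, uncertainty)]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])
```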
Accurate and Reliable Predictions with Mutual-Transport Ensemble
Liu, Han, Cui, Peng, Wang, Bingning, Zhu, Jun, Hu, Xiaolin
Table 3 presents the performance results for various models in detecting misclassifications. Our method showed significant improvements over other single-model calibration techniques and the DE method. OOD Detection: A reliable classification model should exhibit higher prediction uncertainty and lower confidence when encountering test samples significantly different from the training data. We assessed different calibration methods' abilities to differentiate OOD samples by blending in-distribution test data with OOD data. We assessed two capabilities of models trained on CIFAR-10 and CIFAR-100: far OOD detection and near OOD detection (Fort et al., 2019; Hendrycks et al., 2019). Far OOD detection involved distinguishing between CIFAR-10 and SVHN datasets (Netzer et al., 2011) for models trained on CIFAR-10, and between CIFAR-100 and SVHN datasets for models trained on CIFAR-100. Near OOD detection required distinguishing between CIFAR-10 and CIFAR-100 datasets, which have similar domains. The results, presented in Table 4, demonstrate significant improvements of our method compared to other single-model calibration methods, even surpassing the performance of the DE method, known for its effectiveness in OOD detection.
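The evaluation protocol above reduces to scoring each sample's confidence and measuring how well those scores separate in-distribution from OOD data, typically via AUROC. A minimal rank-based sketch (the pairwise definition of AUROC, assumed here as the separation metric):

```python
def auroc(id_scores, ood_scores):
    """Probability that a randomly drawn in-distribution sample scores
    higher than a randomly drawn OOD sample (ties count half) -- the
    rank-based definition of AUROC; 1.0 means perfect separation."""
    wins = 0.0
    for i in id_scores:
        for o in ood_scores:
            if i > o:
                wins += 1.0
            elif i == o:
                wins += 0.5
    return wins / (len(id_scores) * len(ood_scores))
```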